Information Insider
What's Next for Text: Retrieval Trends Past, Present, and Push
Robert J. Boeri and Martin Hensel
Once hard to install and arguably harder to use, text retrieval systems have metamorphosed into collaborative, comprehensive, continuous, conceptual, and even cheap searching solutions.
About five years ago, purchasing a text retrieval tool meant investing in long pilots of insular systems costing $100,000 or more. And none of these systems could claim anything resembling "right out of the box" ease and readiness of use. Every system required painstaking customization. Getting documents into the system often required using scanners and error-prone optical character recognition software. And back then, you had to be careful that you didn't let an ASCII document get into a collection of Word or WordPerfect documents or the daily indexing batch process would fail. The Internet was familiar only to defense contractors and academics, and electronic publishing systems such as Corel's Envoy or Adobe Acrobat remained niche tools, if for no other reason than that they were used by so few.
Accessing a retrieval system across a network was difficult and often very slow, and you needed dedicated IS support to keep the system running. In short, early 1990s-era content-based retrieval used a "pull" model: If you wanted information, you requested it from the system by issuing a query. You then pulled (received) a list of relevant documents in response. Systems were closed, costly, and complex, and they delivered only when explicitly asked.
Flash forward to 1997 and behold the "push" model. Text-retrieval systems have metamorphosed into collaborative, comprehensive, continuous, conceptual, and even cheap searching solutions. Given the sweep of evolutionary change and the transformations it's demanded, it's a wonder any of the original big-three vendors--Verity, Fulcrum, and Excalibur--survived the geotectonic shifts.
Yet survive they did, and those and other text retrieval tools have gone from specialty items to mainstream corporate tools and even desktop commodities. Fulcrum has become the vendor of choice for Microsoft's Exchange. Excalibur and Yahoo! have struck technology agreements. And Verity has aggressively and explicitly pursued general ubiquity for its Topic system. Moreover, these systems have not only grown to accommodate the Web, but they are being profoundly influenced by many Web metaphors--from their user interfaces to their ability to work with and on the World Wide Web.
The way text retrieval's "Big Three" have evolved and continue to evolve provides a fitting window on where text retrieval in general has been and is going.
It was Yahoo!, the Web search service, which first organized searching into conceptual groups like entertainment and sports. Such groupings not only made searching easier but quicker as well, since the engine behind the search had fewer items to process in each request.
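To see why grouping speeds things up, consider a minimal sketch in Python. It is only an illustration of the general idea, not a description of how Yahoo! or any vendor actually built its engine; the toy documents, the category names, and the `build_category_index` and `search` helpers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy collection: (doc_id, category, text). Not real data.
DOCS = [
    (1, "sports", "baseball scores and playoff schedule"),
    (2, "sports", "marathon training schedule"),
    (3, "entertainment", "film festival schedule and reviews"),
    (4, "entertainment", "concert reviews"),
    (5, "business", "quarterly earnings schedule"),
]

def build_category_index(docs):
    """Group documents by category so a search can skip unrelated groups."""
    by_category = defaultdict(list)
    for doc_id, category, text in docs:
        by_category[category].append((doc_id, text))
    return by_category

def search(by_category, term, category=None):
    """Return (matching doc ids, number of documents examined)."""
    if category is None:
        # No group chosen: every document in every group must be examined.
        candidates = [d for group in by_category.values() for d in group]
    else:
        # Group chosen: only that group's documents are examined.
        candidates = by_category.get(category, [])
    hits = [doc_id for doc_id, text in candidates if term in text.split()]
    return hits, len(candidates)

index = build_category_index(DOCS)
all_hits, all_examined = search(index, "schedule")
sports_hits, sports_examined = search(index, "schedule", category="sports")
print(all_hits, all_examined)        # whole collection examined
print(sports_hits, sports_examined)  # only the sports group examined
```

Restricting the search to one group means the engine looks at two documents instead of five here; on a real collection the saving would depend on how evenly documents are spread across groups.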
Creating the conceptual groups carries its own difficulties, however, and deciding which group or groups each document in a database belongs to can be even trickier. Verity's Search'97 product suite, using its new 2.0 search engine, can examine documents and automatically categorize them. Find a document you like using the Excite Web search service, and you can ask it to search for more of the same. The Verity 2.0 engine offers a similar facility, making conceptual searching easier.
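One common way to approximate "more of the same" is to compare the words in the document you liked with the words in every other document and rank by overlap. The sketch below uses cosine similarity over simple word counts. This is an assumed, generic technique chosen for illustration; the article does not say how Excite or the Verity engine compute similarity, and the `COLLECTION` data and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy collection: doc_id -> text. Not real data.
COLLECTION = {
    "a": "text retrieval systems index documents",
    "b": "retrieval systems search indexed text documents",
    "c": "baseball playoff schedule",
    "d": "search agents deliver documents",
}

def vectorize(text):
    """Turn text into a bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 to 1.0)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def more_like_this(seed_id, collection, top_n=2):
    """Rank the other documents by similarity to the seed document."""
    seed = vectorize(collection[seed_id])
    scored = [
        (cosine(seed, vectorize(text)), doc_id)
        for doc_id, text in collection.items()
        if doc_id != seed_id
    ]
    scored.sort(reverse=True)
    # Drop documents that share no words with the seed at all.
    return [doc_id for score, doc_id in scored[:top_n] if score > 0]

similar_to_a = more_like_this("a", COLLECTION)
similar_to_c = more_like_this("c", COLLECTION)
print(similar_to_a, similar_to_c)
```

The same scoring could in principle drive automatic categorization: compare a new document against a representative document for each group and assign it to the closest one. A production engine would also need stemming and term weighting, which this sketch omits (note that "index" and "indexed" count as different words here).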
Another visual metaphor recently proposed comes from Verity. At its recent customer-based Verity Interchange conference, Verity unveiled a graphics-based approach to searching and finding text-based information. Although the approach is still in the research phase, it expresses conceptual clusters of subject matter as two-dimensional maps on which clusters appear as cities. The larger the city, the more comprehensive the category; the closer the clusters, the more related they are. Search results are placed on the map, and their proximity to particular clusters gives an immediate sense of how they relate to the various concept groupings.
Information collections have always needed to be current, but another Web metaphor has extended the notion of continuous information from a pull to a push model. The Web pioneer that made this model best known is PointCast, a free online news service. Fill out a questionnaire detailing your interests, and each time you log on to the Internet a tailored electronic newspaper is delivered to you. Verity's Search'97 product line similarly allows both retrospective pull searching and real-time profiling and indexing for push queries. And it does this while offering the same document and query analysis tools and user interface. The system acts the same, and achieves the same result, whether you run the query or an active agent does the searching for you. Your agent working for you or fading into the background: this is the ultimate in continuous collaboration.
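The difference between the two models comes down to when the matching happens. In a pull search the query runs against documents already stored; in a push search a stored profile runs against each document as it arrives. The Python sketch below makes that concrete under simplifying assumptions: the `Collection` class, its method names, and the all-terms-must-match rule are hypothetical illustrations, not Verity's or PointCast's actual design.

```python
def matches(query_terms, text):
    """One shared matching rule: every query term must appear in the text."""
    words = set(text.lower().split())
    return all(term in words for term in query_terms)

class Collection:
    """Toy document store that supports both pull and push retrieval."""

    def __init__(self):
        self.docs = {}      # doc_id -> text
        self.profiles = {}  # user -> set of terms (a standing query)
        self.inboxes = {}   # user -> doc ids pushed to that user

    def subscribe(self, user, query_terms):
        """Register a standing query, like filling out an interest profile."""
        self.profiles[user] = set(query_terms)
        self.inboxes.setdefault(user, [])

    def pull(self, query_terms):
        """Pull: the user asks, and stored documents are searched."""
        return [doc_id for doc_id, text in self.docs.items()
                if matches(query_terms, text)]

    def add_document(self, doc_id, text):
        """Push: each new document is checked against every stored profile."""
        self.docs[doc_id] = text
        for user, terms in self.profiles.items():
            if matches(terms, text):
                self.inboxes[user].append(doc_id)

store = Collection()
store.add_document(1, "new search engine released")      # arrives before the profile
store.subscribe("reader", ["search", "engine"])
store.add_document(2, "search engine adds format filters")
store.add_document(3, "baseball playoff schedule")

pulled = store.pull(["search", "engine"])  # retrospective: sees documents 1 and 2
pushed = store.inboxes["reader"]           # continuous: only documents added after subscribing
print(pulled, pushed)
```

Because both paths call the same `matches` function, a query behaves the same whether the user runs it or a stored profile runs it. The only difference is timing: the profile sees only documents that arrive after it was registered.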
Fortunately, comprehensiveness is another trend that has emerged in 1997. Much corporate information is found in Structured Query Language (SQL) databases, for example, and Oracle Corporation recently announced shipment of its ConText search system. This system allows full-text searching in a relational database, and search results can even be summarized via SQL. ConText also allows searching on intranets and client/server networks, and, of course, searches in ConText can be performed in a variety of Western languages and Japanese.
Verity's Search'97 product line achieves similar comprehensiveness via filters and vendor partnerships. Using Inso and Mastersoft viewer-filter technologies, Verity achieves the ability to index, search, and display virtually any contemporary binary format. Partnerships with SoftQuad will provide zone searching in HTML and the generalized native SGML. Lexical analysis in Search'97 is enhanced by partnerships with Xerox and Inso, and Asian partners will provide lexical technologies for managing Asian languages. Lest we forget the freely available search option, you can search Acrobat PDF files in many western and other languages, from the desktop to the Web.
Despite all the advancements in text retrieval systems to date, challenges remain. One area where improvements are appearing, and where more are needed, is speed of performance, which Verity, for example, has improved by hiring Web search programmers to optimize code. Other challenges include indexing increasingly massive document collections, ease of constructing queries or instructing search agents, and the ability to sift through massive document infobases and deliver the best possible answers to our questions. In short, the search for text retrieval goes on.
Robert J. Boeri and Martin Hensel are columnists for Information Insider. Boeri is Advanced Systems Specialist in the Information Services Division of Factory Mutual Engineering of Norwood, Massachusetts. Hensel is founder of Martin Hensel Corporation, a Newton, Massachusetts-based consulting firm that builds SGML-based editorial and production systems for publishers, corporations, interactive services, and compositors.
Comments? Email us at letters@onlineinc.com.
Copyright © 1997, Online Inc. All rights reserved.
info@onlineinc.com